(19) Europäisches Patentamt
European Patent Office
Office européen des brevets

(11) EP 0 751 458 A1

(12) EUROPEAN PATENT APPLICATION



(43) Date of publication: 

02.01.1997 Bulletin 1997/01 


(51) Int. Cl.6: G06F 9/38


(21) Application number: 96480077.5 




(22) Date of filing: 31.05.1996 




(84) Designated Contracting States: 
DE FR GB 


(72) Inventor: Chan, Kin 

Austin, Texas 78758 (US) 


(30) Priority: 29.06.1995 US 496833 

(71) Applicant: INTERNATIONAL BUSINESS
MACHINES CORPORATION
Armonk, NY 10504 (US)


(74) Representative: Schuffenecker, Thierry
Compagnie IBM France,
Département de Propriété Intellectuelle
06610 La Gaude (FR)



(54) Method and system for tracking resource allocation within a processor 



(57) A method and system are disclosed for tracking 
the allocation of resources within a processor having 
multiple execution units which support speculative exe- 
cution of instructions. The processor includes a re- 
source counter including a first counter and a second 
counter and a number of resources, wherein one or 
more of the resources are allocated to each of a number 
of instructions dispatched for execution to the execution 
units. In response to dispatching an instruction among 
the plurality of instructions to one of the execution units 
for execution, the first counter is incremented once for 
each of the resources allocated to the instruction, and if 
the instruction is a first instruction within a speculative 
execution path, the second counter is loaded with a val- 
ue of the first counter prior to incrementing the first coun- 
ter. In response to completion of a particular instruction 
among the number of instructions dispatched to one of 
the multiple execution units, the first and the second 
counters are decremented once for each resource allo- 
cated to the particular instruction. In response to a ref- 
utation of the speculative execution path, a value of the 
second counter is transferred to the first counter, such 
that the resource counter tracks a number of the plurality 
of resources allocated to the plurality of instructions. 
Description

BACKGROUND OF THE INVENTION 

1. Technical Field: 

The present invention relates in general to an improved
method and system for data processing, and in particular to
an improved method and system for tracking the allocation of
resources within a processor that supports speculative
execution of instructions. Still more particularly, the
present invention relates to a method and system for tracking
the allocation of resources within a speculatively executing
processor which enable the processor to recover the state of
resource allocation following a mispredicted branch.

2. Description of the Related Art: 

Designers of state-of-the-art processors are continually
attempting to improve performance of such processors.
Recently, processor designers have developed a number of
architectural enhancements that have significantly improved
processor performance over processors utilizing conventional
architectures. For example, Reduced Instruction Set Computer
(RISC) processors utilize reduced instruction sets that
enable such processors to achieve low cycle-per-instruction
(CPI) ratios. To further increase throughput, processors can
also employ a superscalar architecture that enables multiple
instructions to be issued and executed simultaneously by a
number of execution units. As a further enhancement,
execution units within a superscalar processor can be
designed to execute in a pipelined fashion in which each
execution unit processes multiple instructions simultaneously
with one or more instructions at each stage of execution.
Finally, state-of-the-art processors are equipped to execute
instructions in an order determined by the availability of
execution units rather than by sequential programmed order.
This so-called "out of order" execution enables a processor
to maximize the utilization of available execution unit
resources during each cycle.

In a typical pipelined superscalar processor that supports
out-of-order processing, one or more instructions are
dispatched each cycle to a number of execution units. The
instructions are executed opportunistically as execution unit
resources become available with the caveat that the execution
units must adhere to data dependencies between instructions.
That is, if the execution of a first instruction depends upon
data resulting from the execution of a second instruction,
the second instruction must be executed prior to the first
instruction. After an execution unit has completed processing
an instruction, the instruction is forwarded to one of a
number of completion buffers within the superscalar
processor. A completion (rename) buffer is a temporary buffer
which holds an instruction until the instruction is completed
by transferring the data associated with the instruction from
temporary registers to architected registers within the
processor.

Although instructions can execute in any order as long as
data dependencies are observed, most processors require that
instructions are completed (i.e., data committed to
architected registers) in program order. One reason for the
requirement of in-order completion is to enable the processor
to support precise interrupt and exception handling. For
example, when an exception such as a divide-by-zero
arithmetic error occurs, an exception handler software
routine will be invoked to manage the interrupt or exception.
However, before the exception handler can be invoked,
instructions preceding the instruction which generated the
exception have to be completed in program order for the
exception handler to execute in an environment that emulates
the environment which would exist had the instructions been
executed in program order. A second reason for the
requirement of in-order completion is to enable proper
recovery of a prior context if a branch is guessed wrong. As
will be appreciated by those skilled in the art, superscalar
processors typically include a branch execution unit, which
predicts the result of branch instructions. Since the result
of a branch instruction is guessed and instructions following
the branch instruction reentry point are executed
speculatively, the processor must have a mechanism for
recovering a prior processor context if the branch is later
determined to have been guessed wrong. Consequently,
speculatively executed instructions cannot be completed until
branch instructions preceding the speculatively executed
instructions in program order have been completed.

To complete in program order those instructions that were
executed out of order, the processor must be equipped with
facilities which track the program order of instructions
during out-of-order execution. In conventional superscalar
processors which support out-of-order execution, the program
order of instructions is tracked by each of the execution
units. However, as the number of execution units and the
number of instructions which may be executed out-of-order
increase, tracking the program order of instructions burdens
the performance of the execution units. Consequently, it
would be desirable to provide an improved method and system
for managing the instruction flow within a superscalar
processor which enables instructions to be dispatched
in-order, executed out-of-order, and completed in-order and
which does not require that the execution units track the
program order of instructions.

A second source of performance problems within processors
which support speculative execution of instructions is the
recovery of the state of processor resources following a
mispredicted branch. Typically, processors which support
speculative execution of instructions include a branch
history table (BHT) that enables a processor to predict the
outcome of branch instructions based upon prior branch
outcomes. Thus, utilizing data within the BHT, the processor
will begin execution of one or more sequential speculative
execution paths which follow branch instruction reentry
points. In conventional processors which support speculative
execution, once a branch is determined to be guessed wrong,
the processor stalls the execution pipeline until all
sequential instructions preceding the misguessed branch are
completed. Once all valid data is committed from the rename
buffers to architected registers, all of the rename buffers
are flushed and reset. Thereafter, the processor continues
execution and allocation of the rename buffers beginning with
the sequential instruction following the alternative
execution path. Although this recovery mechanism guarantees
that all of the processor resources will be available
following a mispredicted branch, the conventional recovery
mechanism degrades processor performance since the processor
must delay dispatching additional instructions and allocating
rename buffer resources until all instructions preceding the
misguessed branch are completed.

Consequently, it would be desirable to provide an 
improved method and apparatus within a processor 
which enable the processor to restore the correct state 
of processor resources once a speculative execution 
path is determined to be mispredicted. 

SUMMARY OF THE INVENTION 

It is therefore one object of the present invention to 
provide an improved method and system for data 
processing. 

It is another object of the present invention to pro- 
vide an improved method and system for tracking the 
allocation of resources within a processor that supports 
speculative execution of instructions. 

It is yet another object of the present invention to 
provide an improved method and system for tracking the 
allocation of resources within a speculatively executing 
processor which enable the processor to recover the 
state of resource allocation following a mispredicted 
branch. 

The foregoing objects are achieved as is now described. A
method and system are disclosed for tracking the allocation
of resources within a processor having multiple execution
units which support speculative execution of instructions.
The processor includes a resource counter including a first
counter and a second counter and a number of resources,
wherein one or more of the resources are allocated to each of
a number of instructions dispatched for execution to the
execution units. In response to dispatching an instruction
among the plurality of instructions to one of the execution
units for execution, the first counter is incremented once
for each of the resources allocated to the instruction, and
if the instruction is a first instruction within a
speculative execution path, the second counter is loaded with
a value of the first counter prior to incrementing the first
counter. In response to completion of a particular
instruction among the number of instructions dispatched to
one of the multiple execution units, the first and the second
counters are decremented once for each resource allocated to
the particular instruction. In response to a refutation of
the speculative execution path, a value of the second counter
is transferred to the first counter, such that the resource
counter tracks a number of the plurality of resources
allocated to the plurality of instructions.

The above as well as additional objectives, features, and
advantages of the present invention will become apparent in
the following detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS 

The novel features believed characteristic of the invention
are set forth in the appended claims. The invention itself,
however, as well as a preferred mode of use, further
objectives and advantages thereof, will best be understood by
reference to the following detailed description of an
illustrative embodiment when read in conjunction with the
accompanying drawings, wherein:

Figure 1 illustrates a preferred embodiment of a data
processing system which utilizes the method and system of the
present invention;

Figure 2 depicts a block diagram of the system unit of the
data processing system illustrated in Figure 1;

Figure 3 illustrates a block diagram of a preferred
embodiment of a processor which employs the method and system
of the present invention;

Figure 4 depicts a more detailed block diagram of the
instruction sequencing table (IST) illustrated in Figure 3;

Figure 5 illustrates a preferred embodiment of a counter
which indicates a number of allocated entries within the
instruction sequencing table depicted in Figure 4;

Figure 6 depicts a preferred embodiment of a counter which
indicates a number of allocated floating-point rename
buffers;

Figure 7 illustrates a preferred embodiment of a counter
which indicates a number of allocated general purpose rename
buffers;

Figure 8 depicts a flowchart of the operation of the
instruction sequencing table during a dispatch cycle;

Figure 9 illustrates a flowchart of the operation of the
instruction sequencing table during a finish cycle; and

Figure 10 depicts a flowchart of the operation of the
instruction sequencing table during a completion cycle.

DETAILED DESCRIPTION OF PREFERRED 
EMBODIMENT 

With reference now to the figures and in particular with
reference to Figure 1, there is illustrated a block diagram
of a data processing system which employs the method and
system of the present invention. As illustrated, data
processing system 10 comprises system unit 12 and one or more
local nodes 14, which include personal computer 16, display
18, keyboard 20, and mouse 22. As is well-known to those
skilled in the art, a user inputs data to personal computer
16 utilizing keyboard 20, mouse 22, or other suitable input
device. The user may then process the data locally utilizing
personal computer 16, or transmit the data from personal
computer 16 to system unit 12 or another node 14 utilizing
well-known networking techniques. It is advantageous for a
user to send tasks to system unit 12 for execution since
system unit 12 can execute tasks in a relatively short period
of time compared to node 14. System unit 12 and personal
computer 16 output data to a user via display device 18.

Referring now to Figure 2, there is depicted a block diagram
of system unit 12, which in a preferred embodiment of the
present invention comprises a symmetric multiprocessor
computer, such as the IBM RISC System/6000. System unit 12
includes one or more CPUs 30, which each include an on-board
level one (L1) cache 32. Each CPU 30 is also associated with
a level two (L2) cache 34. As will be understood by those
skilled in the art, L1 caches 32 and L2 caches 34 each
comprise a small amount of high-speed memory which stores
frequently accessed segments of data and instructions. If
data requested by a CPU 30 is not resident within the L1
cache 32 or L2 cache 34 associated with CPU 30, the requested
data is retrieved from main memory 36 via system bus 38.

System unit 12 also includes SCSI controller 40 and bus
interface 46. SCSI controller 40 enables a user to attach
additional SCSI devices 42 to system unit 12 via peripheral
bus 44. Bus interface 46 provides facilities that enable
multiple local nodes 14 to access system resources available
within system unit 12. As will be appreciated by those
skilled in the art, system unit 12 includes additional
hardware coupled to system bus 38 that is not necessary for
an understanding of the present invention and is accordingly
omitted for simplicity.

With reference now to Figure 3, there is illustrated a
preferred embodiment of a CPU 30 in accordance with the
method and system of the present invention. In the preferred
embodiment depicted in Figure 3, CPU 30 comprises a
superscalar processor that issues multiple instructions into
multiple execution pipelines each cycle, thereby enabling
multiple instructions to be executed simultaneously. CPU 30
has five execution units 60-68, including fixed-point units
60 and 62, load-store unit 64, floating-point unit 66, and
logical condition register unit 68.

According to the present invention, CPU 30 also includes
instruction sequencing table (IST) 80, which enables CPU 30
to track the execution of instructions by execution units
60-68 and to complete instructions in program order.
Referring now to Figure 4, there is depicted a block diagram
of a preferred embodiment of IST 80. As illustrated, IST 80
includes a number of entries 110, which each contain a finish
bit 112, exception code field 114, general purpose register
(GPR) field 116, floating-point register (FPR) field 118, and
branch bit 120. Entries 110 are addressed by one of 16
instruction IDs, which are each associated with an
outstanding instruction, that is, an instruction that has
been dispatched, but not completed.
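
As an illustration only, the entry layout just described could
be modeled in software as follows. This is a minimal Python
sketch under stated assumptions, not the disclosed hardware:
the names ISTEntry and make_ist are hypothetical, fields 116
and 118 are assumed to hold register counts, and an exception
code of 0 is assumed to mean "no exception".

```python
from dataclasses import dataclass
from typing import List

NUM_IST_ENTRIES = 16  # entries are addressed by one of 16 instruction IDs


@dataclass
class ISTEntry:
    """Software stand-in for one IST entry 110 (names are hypothetical)."""
    finished: bool = False    # finish bit 112
    exception_code: int = 0   # exception code field 114 (0 = none, assumed)
    gpr_count: int = 0        # GPR field 116 (modeled as a count, assumed)
    fpr_count: int = 0        # FPR field 118 (modeled as a count, assumed)
    is_branch: bool = False   # branch bit 120


def make_ist() -> List[ISTEntry]:
    """Build an empty table with one entry per instruction ID."""
    return [ISTEntry() for _ in range(NUM_IST_ENTRIES)]


ist = make_ist()
ist[0b1101].gpr_count = 2  # entry addressed by instruction ID "1101"
```

Indexing the list by instruction ID mirrors the addressing of
entries 110 by one of 16 instruction IDs.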

With reference now to Figure 8, there is illustrated a
flowchart of the operation of IST 80 during a dispatch cycle.
As the process begins at block 200, instruction fetch address
register (IFAR) 52 calculates the address of the next
instructions to be fetched from instruction cache 54 based
upon information received from program counter 104. The group
of instructions specified by the address generated by IFAR 52
is loaded in parallel into instruction buffer 56 and dispatch
unit 58 from instruction cache 54. The process then proceeds
to block 202, which depicts determining a number of available
entries 110 within IST 80. In a preferred embodiment of the
present invention, the number of available entries 110 within
IST 80 is easily determined from an IST entry counter 130
(illustrated in Figure 5) within resource counters 98 that
counts the number of allocated IST entries 110. In the
preferred embodiment illustrated in Figure 4, up to three
instructions can be dispatched during each cycle if
sufficient entries 110 are available within IST 80.

Next, at block 204, instruction buffer 56 reads out in
program order a set of instructions for which IST entries 110
are available. Utilizing resource availability information
received from completion unit 88 and resource counters 98,
dispatch unit 58 enables selected ones of execution units
60-68 to begin execution of instructions for which resources,
such as rename buffers 90 and 92, are available. Each
instruction dispatched from instruction buffer 56 is assigned
one of the instruction IDs specified by dispatch pointers 82.
Since instructions are dispatched in program order, entries
within IST 80 are allocated in program order. Thus, for the
state of IST 80 depicted in Figure 4, if only a single
instruction were dispatched during a dispatch cycle, that
instruction would be assigned the entry 110 associated with
instruction ID "1101" and specified as dispatch instruction
ID 1 by dispatch pointers 82.

The process then proceeds to block 206, which illustrates
writing completion information into IST 80 for each
instruction dispatched. Each instruction issued from dispatch
buffer 56 is processed by instruction decode unit (IDU) 70.
IDU 70 decodes each instruction to determine the register
resources required to complete the instruction. Thus, by
determining the type of each instruction, IDU 70 can
determine the number of general purpose registers (GPRs) and
floating-point registers (FPRs) required to store the data
associated with the instruction. Once IDU 70 has determined
the register resources required to execute an instruction,
IDU 70 writes the information into the appropriate entry 110
within IST 80. Next, the process proceeds to block 208, which
depicts determining which, if any, of the dispatched
instructions are speculative. If a dispatched instruction is
the first instruction within a speculative execution path,
the process proceeds to block 210, which depicts storing the
dispatch pointer 82 (i.e., instruction ID) pointing to the
entry allocated to the speculative instruction as a backup
pointer 84. Storing the instruction ID of the first
instruction within each speculative execution path enables
CPU 30 to recover the correct execution context if a branch
is later determined to have been misguessed.

The process proceeds from either block 208 or block 210 to
block 212, which illustrates updating IST entry counter 130
and dispatch pointers 82. IST entry counter 130 is updated by
IST control 100, which increments or decrements IST entry
counter 130 by the net number of entries 110 allocated during
the cycle after taking into account both dispatched and
completed instructions. Dispatch pointers 82 are updated by
incrementing the instruction ID to which dispatch pointers 82
point by the number of instructions dispatched during the
cycle. Utilizing rotating pointers rather than a shifting
queue enhances the performance of IST 80 since only dispatch
pointers 82 are updated each cycle rather than every entry
110. Thereafter, the process proceeds to block 214 where the
process terminates.
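
The dispatch cycle described above can be sketched in software
as follows. This is a simplified Python illustration under
stated assumptions rather than the disclosed circuit: the class
DispatchState and its attribute names are hypothetical, a
single rotating index stands in for dispatch pointers 82, only
one backup pointer is kept, and completions during the same
cycle are ignored.

```python
NUM_IST_ENTRIES = 16        # instruction IDs 0..15
MAX_DISPATCH_PER_CYCLE = 3  # up to three instructions per cycle


class DispatchState:
    """Hypothetical software model of dispatch pointers 82 and counter 130."""

    def __init__(self, next_id=0, allocated=0):
        self.next_id = next_id      # ID the first dispatch pointer names
        self.allocated = allocated  # number of IST entries currently in use
        self.backup_pointer = None  # backup pointer 84 (None = no speculation)

    def dispatch(self, starts_speculation):
        """Dispatch a group in program order; return the IDs assigned.

        starts_speculation holds one bool per candidate instruction, True
        where that instruction is the first of a speculative path.
        """
        free = NUM_IST_ENTRIES - self.allocated
        count = min(len(starts_speculation), MAX_DISPATCH_PER_CYCLE, free)
        ids = []
        for offset in range(count):
            instr_id = (self.next_id + offset) % NUM_IST_ENTRIES
            if starts_speculation[offset]:
                self.backup_pointer = instr_id  # remember where the path began
            ids.append(instr_id)
        # Only the pointer and the count change; no entry is moved.
        self.next_id = (self.next_id + count) % NUM_IST_ENTRIES
        self.allocated += count
        return ids
```

Because the instruction ID wraps modulo 16, allocation never
requires shifting table contents, which is the benefit the
text attributes to rotating pointers.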

Referring now to Figure 9, there is depicted a flowchart of
the operation of IST 80 during a finish cycle. As is well
known to those skilled in the art, each of execution units
60-68 is an execution pipeline having multiple stages, such
as fetch, decode, execute, and finish, which can accommodate
one or more instructions at each stage. Because execution
units 60-68 operate independently and because the number of
cycles required to execute instructions can vary due to data
dependencies, branch resolutions, and other factors,
execution units 60-68 execute instructions out of program
order. As illustrated, the process begins at block 230 and
thereafter proceeds to block 232, which depicts IST 80
receiving an instruction ID and finish report from execution
units 60-68 for each instruction finished during the cycle.
The finish report includes an exception code which identifies
the exception generated by execution of the instruction, if
any. The process then proceeds to block 234, which
illustrates IST 80 writing the exception code received at
block 232 into the exception code field 114 of the entry 110
identified by the finished instruction's ID. In addition, at
block 234, finish bit 112 within entry 110 is set to indicate
that the instruction has finished execution. In a preferred
embodiment of the present invention, up to six finish reports
can be written to IST 80 during a finish cycle. Following
block 234, the process terminates at block 236.
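
A finish cycle of this kind might be modeled as shown below.
The sketch is illustrative only: the function name
apply_finish_reports, the dictionary layout of an entry, and
the use of 0 for "no exception" are assumptions introduced
here.

```python
MAX_FINISH_REPORTS = 6  # up to six finish reports per finish cycle
NO_EXCEPTION = 0        # assumed encoding for "no exception"


def apply_finish_reports(ist, reports):
    """Record (instruction_id, exception_code) pairs in the table.

    ist is a list of dicts with 'finished' and 'exception_code' keys,
    a hypothetical stand-in for entries 110.
    """
    if len(reports) > MAX_FINISH_REPORTS:
        raise ValueError("too many finish reports for one cycle")
    for instr_id, exception_code in reports:
        entry = ist[instr_id]
        entry["exception_code"] = exception_code  # exception code field 114
        entry["finished"] = True                  # finish bit 112


ist = [{"finished": False, "exception_code": NO_EXCEPTION} for _ in range(16)]
# Reports may arrive out of program order: ID 9 finishes before ID 7.
apply_finish_reports(ist, [(9, NO_EXCEPTION), (7, 4)])
```

Since each report carries its own instruction ID, the table
can be updated regardless of the order in which the execution
units finish.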

With reference now to Figure 10, there is depicted a
flowchart of the operation of IST 80 during a completion
cycle. As illustrated, the process begins at block 240 and
thereafter proceeds to block 242, which depicts completion
unit 88 reading out instructions from IST 80 that are
indicated by completion pointers 86. As depicted in Figure 4,
a preferred embodiment of the present invention maintains
three completion pointers 86 that specify instructions which
can potentially be completed within a given processor cycle.
The process then proceeds from block 242 to block 244, which
illustrates completion unit 88 determining which of the
instructions read out at block 242 generated exceptions that
have not been handled. Completion unit 88 determines if an
instruction generated an exception by examining the exception
code field 114 associated with each instruction. If the first
instruction (i.e., the instruction whose associated entry 110
is specified as completion instruction ID 1 by one of
completion pointers 86) generated an exception, the process
proceeds from block 244 through block 246 to block 248, which
depicts forwarding the first instruction to interrupt
handling unit 102. As will be understood by those skilled in
the art, interrupt handling unit 102 calls an exception
handling vector associated with the exception type specified
by the exception code written within exception code field
114. Thereafter, the process proceeds from block 248 to block
254.

Returning to block 244, if the first instruction read out
from IST 80 did not generate an exception, the process
proceeds from block 244 through block 246 to block 249, which
depicts determining which of the instructions read out at
block 242 can be completed during the current cycle. In order
to support precise interrupts, several constraints are placed
upon the completion of instructions. First, only instructions
that are marked as finished within IST 80 by finish bit 112
can be completed. Second, instructions that generated an
exception which has not been handled cannot be completed in
the present completion cycle. Third, an instruction can be
completed only if all instructions preceding the instruction
in program order have already been completed or will be
completed during the current completion cycle. Finally, for
an instruction to be completed, the requisite number of
general purpose registers and floating-point registers must
be available within general purpose register file 94 and
floating-point register file 96. Following block 249, the
process proceeds to block 250, which depicts completion unit
88 completing instructions which satisfy the foregoing
conditions by writing data associated with the instructions
from GPR and FPR rename buffers 90 and 92 to GPR and FPR
files 94 and 96.

Thereafter, the process proceeds from block 250 to block 252,
which depicts IST control 100 freeing IST entries 110 that
are associated with the instructions completed at block 250.
IST control 100 frees IST entries 110 by incrementing each of
completion pointers 86 once for each instruction completed.
Thereafter, the process proceeds to block 254 where the
process terminates.
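
The completion rules described above (finished, no unhandled
exception, in program order, registers available) can be
sketched as a single pass over the oldest outstanding entries.
The following Python sketch is an assumption-laden
illustration: completion_cycle, the dictionary keys, the
outstanding count, and the gpr_slots and fpr_slots budgets are
hypothetical names, and a single head index stands in for the
three completion pointers 86.

```python
NUM_IST_ENTRIES = 16
MAX_COMPLETE_PER_CYCLE = 3  # three completion pointers 86
NO_EXCEPTION = 0            # assumed encoding for "no exception"


def completion_cycle(ist, head, outstanding, gpr_slots, fpr_slots):
    """Return (completed_ids, new_head, exception_id) for one cycle.

    ist: list of dicts with 'finished', 'exception_code', 'gprs', 'fprs'.
    head: instruction ID named by the first completion pointer.
    outstanding: number of dispatched but not yet completed instructions.
    gpr_slots / fpr_slots: registers assumed available this cycle.
    """
    completed = []
    for offset in range(min(MAX_COMPLETE_PER_CYCLE, outstanding)):
        instr_id = (head + offset) % NUM_IST_ENTRIES
        entry = ist[instr_id]
        if not entry["finished"]:
            break  # in-order rule: nothing younger may complete either
        if entry["exception_code"] != NO_EXCEPTION:
            if offset == 0:
                # Oldest instruction raised an exception: hand it to the
                # interrupt handler and complete nothing this cycle.
                return [], head, instr_id
            break  # a younger exception waits until it becomes the oldest
        if entry["gprs"] > gpr_slots or entry["fprs"] > fpr_slots:
            break  # required registers are not available
        gpr_slots -= entry["gprs"]
        fpr_slots -= entry["fprs"]
        completed.append(instr_id)
    # Freeing entries amounts to advancing the pointer past them.
    new_head = (head + len(completed)) % NUM_IST_ENTRIES
    return completed, new_head, None


ist = [dict(finished=False, exception_code=NO_EXCEPTION, gprs=1, fprs=0)
       for _ in range(NUM_IST_ENTRIES)]
ist[14]["finished"] = ist[15]["finished"] = True
result = completion_cycle(ist, head=14, outstanding=3,
                          gpr_slots=4, fpr_slots=4)
```

Stopping at the first entry that fails any test is what keeps
completion in program order in this model.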

Referring now to Figures 5-7, there are illustrated block
diagrams of IST entry counter 130, FPR rename buffer counter
150, and GPR rename buffer counter 170, which together
comprise resource counters 98. With reference first to Figure
5, IST entry counter 130 includes multiplexers 132-137 and
counters 138-142. According to a preferred embodiment of the
present invention, counter 138 comprises a 17-bit shift
counter which indicates in decoded format how many of the 16
IST entries 110 are currently allocated to outstanding
instructions. Counter 138 is said to be in decoded format
since the position of a set bit (a binary 1) within the
counter indicates the number of allocated entries 110. For
example, when IST 80 is empty, only the least significant
(left most) bit is set, indicating that 0 entries 110 are
allocated; if IST 80 is full, only the most significant bit
is set. By storing the counters in decoded format rather than
utilizing a register which is incremented and decremented by
adders, the present invention not only minimizes the cycle
time utilized to update counter 138, but also minimizes the
complexity of CPU 30 and the chip substrate area consumed.
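
The decoded format can be pictured with a short software
analogy. In the sketch below, which is only an illustration,
a Python integer holds the 17-bit pattern and bit 0 is assumed
to correspond to a count of zero; the function names encode,
decode, and shift are hypothetical.

```python
WIDTH = 17  # one bit position for each possible count, 0 through 16


def encode(count):
    """Return the decoded (one-hot) pattern for a count: only bit `count` set."""
    if not 0 <= count < WIDTH:
        raise ValueError("count out of range")
    return 1 << count


def decode(pattern):
    """Return the count represented by a one-hot pattern."""
    if pattern <= 0 or pattern & (pattern - 1) or pattern >= 1 << WIDTH:
        raise ValueError("not a valid one-hot pattern")
    return pattern.bit_length() - 1


def shift(pattern, delta):
    """Adjust the count by moving the set bit; no adder is involved."""
    decode(pattern)  # validate the input pattern
    moved = pattern << delta if delta >= 0 else pattern >> -delta
    decode(moved)    # raises if the bit left the 17 positions
    return moved
```

For instance, shift(encode(5), 3) yields the pattern for 8
allocated entries, while shifting the pattern for 16 upward
raises an error because no seventeenth entry exists.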

During each cycle, IST control 100 computes the net change in
the number of allocated entries 110 from the number of
instructions dispatched and completed during that cycle. In a
preferred embodiment of the present invention, the net change
in the number of allocated entries 110 varies between +3
during cycles in which 3 instructions are dispatched and 0
instructions are completed to -3 during cycles in which 3
instructions are completed and 0 instructions are dispatched.
IST control 100 updates counter 138 to reflect the current
number of allocated entries 110 by selecting the appropriate
update input to multiplexer 132, which in turn shifts the set
bit within counter 138 a corresponding number of bit
positions. Because an entry 110 is required for each
instruction dispatched, counter 138 provides an interlock
that prevents dispatch unit 58 from dispatching more
instructions than can be accommodated within entries 110 in
IST 80.
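
One way to picture the multiplexer-based update and the
dispatch interlock is sketched below. This is an illustrative
Python model, not the disclosed logic: the functions
mux_inputs, update, and dispatch_limit are hypothetical, and
the assumption that there is one candidate input per net
change from -3 to +3 is introduced here for the example.

```python
WIDTH = 17          # bit positions of the decoded counter (counts 0..16)
MAX_PER_CYCLE = 3   # at most 3 dispatched and 3 completed per cycle


def mux_inputs(pattern):
    """Return {net change: shifted pattern} for every legal net change."""
    inputs = {}
    for delta in range(-MAX_PER_CYCLE, MAX_PER_CYCLE + 1):
        shifted = pattern << delta if delta >= 0 else pattern >> -delta
        if 0 < shifted < (1 << WIDTH):  # set bit must stay in range
            inputs[delta] = shifted
    return inputs


def update(pattern, dispatched, completed):
    """Select the input matching this cycle's net change in allocations."""
    return mux_inputs(pattern)[dispatched - completed]


def dispatch_limit(pattern):
    """Interlock: how many instructions may be dispatched this cycle."""
    allocated = pattern.bit_length() - 1
    return min(MAX_PER_CYCLE, (WIDTH - 1) - allocated)
```

With 15 entries allocated, dispatch_limit(1 << 15) permits a
single dispatch, and with all 16 allocated it permits none.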

IST entry counter 130 also includes backup buffer counter A
140 and backup buffer counter B 142, which comprise shift
counters like counter 138. Backup buffer counter A 140
indicates a number of allocated IST entries 110 excluding
instructions within a first speculative execution path.
Similarly, backup buffer counter B 142 indicates the number
of allocated IST entries 110 excluding instructions within a
second speculative execution path. As will be appreciated by
those skilled in the art, embodiments of the present
invention which support more than two speculative execution
paths include one additional backup buffer counter for each
additional speculative execution path permitted.

When the first instruction within a speculative execution
path is dispatched, IST control 100 enables the select input
to mux 133 to load the value of counter 138, which indicates
the number of IST entries 110 allocated prior to dispatching
instructions during the current cycle, into backup buffer
counter A 140. In addition, IST control 100 selects the
appropriate update input to mux 134 to update backup buffer
counter A 140. For example, if the second and third
instructions dispatched are speculative and 3 outstanding
instructions are completed during the current cycle, IST
control 100 selects the -2 update input. As illustrated,
counter 140 can be incremented by a maximum of two entries
since speculative instructions account for at least one of
the three instructions which can be dispatched during the
current cycle. During cycles while speculative execution path
A remains unresolved, IST control logic 100 selects the
appropriate path A input of mux 134 to update backup buffer
counter A 140 to reflect the reduction in allocated entries
110 due to completion of outstanding nonspeculative
instructions. If speculative execution path A is resolved as
guessed correctly, the contents of backup buffer counter A
140 are simply ignored. If, however, speculative execution
path A is resolved as guessed wrong, IST control 100 enables
the select input to mux 137 to load the value of backup
buffer counter A 140 into counter 138. In addition, IST
control 100 selects the appropriate path A input to mux 132
to account for instructions completed during the current
cycle. Thus, counter 138 maintains a correct count of
allocated entries 110 even in cases where branches are
misguessed.
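
The interplay between counter 138 and backup buffer counter A
140 can be traced with a small software model. The following
is a minimal Python sketch under stated assumptions, not the
disclosed circuit: plain integers replace the shift counters,
only one speculative path is modeled, and the class
EntryCounter with its method and parameter names is
hypothetical.

```python
class EntryCounter:
    """Hypothetical model of counter 138 with backup buffer counter A 140."""

    def __init__(self, allocated=0):
        self.count = allocated  # counter 138: all allocated entries
        self.backup_a = None    # counter 140; None while no path is unresolved

    def cycle(self, dispatched, completed, first_speculative=None):
        """Advance one cycle.

        first_speculative is the 0-based position, within this cycle's
        dispatch group, of the first instruction on speculative path A,
        or None if no new speculative path starts this cycle.
        """
        if first_speculative is not None:
            # Load the pre-dispatch count, then add the nonspeculative
            # instructions dispatched ahead of the speculative ones.
            self.backup_a = self.count + first_speculative
        if self.backup_a is not None:
            # Only nonspeculative instructions can complete while the
            # path is unresolved, so completions reduce the backup too.
            self.backup_a -= completed
        self.count += dispatched - completed

    def resolve(self, predicted_correctly, completed=0):
        """Resolve path A; on a misprediction, restore the backup count."""
        if self.backup_a is None:
            raise RuntimeError("no speculative path is outstanding")
        if predicted_correctly:
            self.count -= completed  # backup contents are ignored
        else:
            self.count = self.backup_a - completed
        self.backup_a = None


counter = EntryCounter(allocated=5)
# One nonspeculative and two speculative dispatches, three completions:
# the net update to the backup is 1 - 3 = -2, matching the -2 update
# input in the example given in the text.
counter.cycle(dispatched=3, completed=3, first_speculative=1)
```

After this cycle the model holds 5 allocated entries in total
and a backup of 3, so a later misprediction restores the count
without waiting for outstanding instructions to drain.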

As will be appreciated by those skilled in the art,
mux 136 and backup buffer counter B 142 operate similarly
to mux 134 and backup buffer counter A 140 to
allow recovery from a second speculative execution
path taken prior to the resolution of speculative path A.
If speculative path A is resolved as correctly predicted
and speculative path B (the second speculative execution
path) is resolved as mispredicted, IST control 100
selects the appropriate input to mux 137 to load the value
of backup buffer counter B 142 into counter 138. In
addition, IST control 100 updates counter 138 by selecting
the appropriate path B input to mux 132 to account
for instructions completed during the current cycle.
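
Recovery with two outstanding speculative paths might be
modelled as follows. This is a behavioural sketch only, not
the circuit of Figure 5: the class and method names are
hypothetical, muxes are not modelled, and it assumes that
refuting a path also discards any path begun after it.

```python
class EntryCounter:
    """Primary count plus one backup count per unresolved path."""

    def __init__(self, count=0):
        self.count = count   # plays the role of counter 138
        self.backups = {}    # path label ("A", "B") -> backup count

    def begin_path(self, label):
        # Snapshot the count before entries on the new path are allocated.
        self.backups[label] = self.count

    def dispatch(self, entries):
        self.count += entries

    def complete(self, entries):
        # Completions release entries in the primary and every backup.
        self.count -= entries
        for label in self.backups:
            self.backups[label] -= entries

    def resolve_correct(self, label):
        # The backup is no longer needed and is simply dropped.
        del self.backups[label]

    def resolve_mispredicted(self, label):
        # Restore the count, then drop this path and any later path.
        self.count = self.backups[label]
        labels = list(self.backups)
        for later in labels[labels.index(label):]:
            del self.backups[later]


counter = EntryCounter(count=4)
counter.begin_path("A")
counter.dispatch(2)          # two entries on path A
counter.begin_path("B")
counter.dispatch(3)          # three entries on path B
counter.complete(2)          # two older instructions complete
counter.resolve_correct("A")
counter.resolve_mispredicted("B")
print(counter.count)         # 4
```

The restored value, 4, equals the four original entries
plus the two on path A less the two completed, which is the
count that excludes only the refuted path B.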

Referring now to Figure 6, there is depicted a block
diagram of FPR rename buffer counter 150, which indicates
the number of allocated FPR rename buffers 92.
As is evident by inspection of Figure 6, FPR rename
buffer counter 150 functions much like IST entry counter
130. Backup buffer counter A 160 and backup buffer
counter B 162 maintain a correct count of the number
of allocated FPR rename buffers 92 in cases where either
of two branch instructions are mispredicted, thereby
enabling FPR rename buffer counter 150 to restore the
correct FPR buffer count to counter 158 in a single cycle.
In the illustrated embodiment, up to 3 of FPR rename
buffers 92 can be assigned to instructions and up to 3
of FPR rename buffers 92 can be written to FPR file 96
during each cycle.

With reference now to Figure 7, there is illustrated 
a preferred embodiment of GPR rename buffer counter 
170, which counts the number of GPR rename buffers 
90 assigned to outstanding instructions. As will be ap- 
preciated by those skilled in the art, GPR rename buffer 
counter 170 operates similarly to FPR rename buffer 
counter 150, except for a difference in the number of 
GPR rename buffers 90 which can be allocated and re-
tired within a cycle. In the depicted embodiment, up to
two of GPR rename buffers 90 can be assigned to each 
instruction upon dispatch since two GPR rename buff- 
ers 90 are required to execute a "load and update" in- 
struction. However, only two of GPR rename buffers 90 
can be written to GPR file 94 during a given completion 
cycle. 

The design of FPR and GPR rename buffer 
counters 150 and 170 enhances the performance of the 
present invention as compared to prior art systems 
since resources allocated to mispredicted branches can 
be more quickly reallocated. Prior art processors which 
support speculative execution of instructions typically 
do not include facilities such as backup buffer counters 
A and B to enable the processors to recover the correct 
state of processor resources following a misguessed 
branch. In conventional processors which support spec- 
ulative execution, once a branch is determined to be 
guessed wrong, the processor stalls the execution pipe- 
line until all sequential instructions preceding the mis- 
guessed branch are completed. Once all valid data is 
committed from the rename buffers to architected reg- 
isters, all of the rename buffers are flushed and reset. 
Thereafter, the processor continues execution and allo- 
cation of the rename buffers beginning with the sequen- 
tial instruction following the alternative execution path. 
Although this mechanism is relatively efficient in terms 
of the circuitry required to recover from a misguessed 
branch, the recovery mechanism degrades processor 
performance since the processor must delay dispatching
additional instructions and allocating rename buffer
resources until all instructions preceding the
misguessed branch are completed.

As has been described, the present invention provides
an improved method and system for managing the
flow of instructions through a superscalar processor
which supports out-of-order execution. By maintaining
an entry corresponding to each outstanding instruction
within an instruction sequencing table, the present
invention enables instructions executed out of program
order by multiple execution units to be completed in
order, thereby supporting precise interrupts. Furthermore,
the present invention provides an efficient mechanism
for recovering from misguessed branches which enables
the recovery of both the program state and the
resource state of the processor prior to the misguessed
branch. Although a processor which employs the
present invention has been described with reference to
various limitations with respect to a number of
instructions which can be dispatched, finished, and completed
during a given processor cycle, those skilled in the art
will appreciate that these limitations are merely design
choices and do not serve as limitations on the present
invention.

While the invention has been particularly shown
and described with reference to a preferred embodiment,
it will be understood by those skilled in the art that
various changes in form and detail may be made therein
without departing from the spirit and scope of the
invention.


Claims 

1. A method for tracking the allocation of resources
within a processor which supports speculative
execution of instructions, said processor having a
plurality of execution units, a resource counter
including a first counter and a second counter, and a
plurality of resources, wherein one or more of said
plurality of resources are allocated to each of a plurality
of instructions dispatched for execution to said
plurality of execution units, said method comprising:

in response to dispatching an instruction
among said plurality of instructions to one of
said plurality of execution units for execution:

incrementing said first counter once for each of
said plurality of resources allocated to said
instruction;

if said instruction is a first instruction within a
speculative execution path, loading said second
counter with a value of said first counter
prior to incrementing said first counter;

in response to completion of a particular
instruction among said plurality of instructions
dispatched to one of said plurality of execution
units, decrementing said first and said second
counters once for each resource allocated to
said particular instruction; and

in response to a refutation of said speculative
execution path, transferring a value of said second
counter to said first counter, wherein said
resource counter tracks a number of said
plurality of resources allocated to said plurality of
instructions.






2. The method for tracking the allocation of resources
within a processor of Claim 1, wherein said processor
comprises a superscalar processor capable of
dispatching and completing multiple instructions
during each cycle, wherein said step of loading said
second counter with a value of said first counter
further comprises:

incrementing said second counter once for each of
said plurality of resources allocated to nonspeculative
instructions among said plurality of instructions
that are dispatched concurrently with an instruction
which is a first instruction within a speculative
execution path.


3. The method for tracking the allocation of resources
within a processor of Claim 1, wherein said processor
supports a second speculative execution path
and said resource counter further includes a third
counter, said method further comprising:

in response to dispatching a selected instruction
among said plurality of instructions to one
of said plurality of execution units for execution,
wherein said selected instruction is a first
instruction within a second speculative execution
path, loading said third counter with a value of
said first counter prior to incrementing said first
counter;

in response to completion of a particular
instruction among said plurality of instructions
dispatched to one of said plurality of execution
units, decrementing said third counter once for
each resource allocated to said particular
instruction; and

in response to resolution of a first speculative
execution path as correctly predicted and
refutation of said second speculative execution
path, transferring a value of said third counter
to said first counter, wherein said resource
counter tracks a number of said plurality of
resources allocated to said plurality of
instructions.

4. The method for tracking the allocation of resources
within a processor of Claim 1, said first and said
second counters comprising first and second shift
registers, respectively, wherein each of said first and
said second shift registers indicates a number of
allocated resources among said plurality of resources
by a bit position of a set bit within said first and said
second shift registers, wherein said step of
incrementing said first counter comprises shifting said
set bit in a first direction within said first shift register
one bit position for each of said plurality of resources
allocated to said instruction, and wherein said
step of decrementing said first and said second
counters comprises shifting said set bits within said
first and said second shift registers in a second
direction one bit position for each resource allocated
to said particular instruction.
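
A counter that encodes its value as the position of a
single set bit can be sketched in software as below. The
class name, the choice of a left shift as the first
direction, and the error handling are assumptions made for
illustration; the width of one bit more than the number of
resources follows the M+1 bit registers recited later in
the claims.

```python
class ShiftRegisterCounter:
    """Count held as the position of one set bit (one-hot encoding)."""

    def __init__(self, num_resources):
        self.width = num_resources + 1  # positions 0 .. num_resources
        self.bits = 1                   # bit 0 set: nothing allocated

    def increment(self, n=1):
        # Shift the set bit n positions in the first direction (left here).
        shifted = self.bits << n
        if shifted >= (1 << self.width):
            raise OverflowError("more resources than the register can show")
        self.bits = shifted

    def decrement(self, n=1):
        # Shift the set bit n positions in the second direction (right).
        if self.bits & ((1 << n) - 1):
            raise ValueError("fewer than n resources are allocated")
        self.bits >>= n

    def value(self):
        # Position of the set bit equals the number of allocated resources.
        return self.bits.bit_length() - 1


first = ShiftRegisterCounter(num_resources=4)
second = ShiftRegisterCounter(num_resources=4)
first.increment(2)
second.bits = first.bits   # load the second counter from the first
first.increment(1)         # allocation on the speculative path
first.decrement(1)         # a completion decrements both counters
second.decrement(1)
print(first.value(), second.value())  # 2 1
```

Loading one register from the other is a plain copy of the
bit pattern, so no arithmetic is needed to save or restore
a count in this representation.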



5. An apparatus for tracking the allocation of resourc- 
es within a processor which supports speculative 
execution of instructions, said processor having a 
plurality of execution units and a plurality of resourc- 
es, wherein one or more of said plurality of resourc- 
es are allocated to each of a plurality of instructions 
dispatched for execution to said plurality of execu- 
tion units, said apparatus comprising: 

a resource counter having a first counter and a 
second counter; 

means for incrementing said first counter once 
for each of said plurality of resources allocated 
to said instruction in response to dispatching an 
instruction among said plurality of instructions 
to one of said plurality of execution units for ex- 
ecution; 

means for loading said second counter with a 
value of said first counter prior to incrementing 
said first counter in response to dispatching a 
particular instruction among said plurality of in- 
structions to one of said plurality of execution 
units for execution, wherein said particular in- 
struction is a first instruction within a specula- 
tive execution path; 

means for decrementing said first and said sec- 
ond counters once for each resource allocated 
to an instruction among said plurality of instruc- 
tions dispatched to said plurality of execution 
units in response to completion of said
instruction; and

means for transferring a value of said second 
counter to said first counter in response to ref- 
utation of said speculative execution path, 
wherein said resource counter tracks a number 
of said plurality of resources allocated to said 
plurality of instructions. 

6. The apparatus for tracking the allocation of resourc- 
es within a processor of Claim 5, wherein said proc- 
essor comprises a superscalar processor capable 
of dispatching and completing multiple instructions 
during each cycle, wherein said means for loading 
said second counter with a value of said first counter 
further comprises: 

means for incrementing said second counter once 
for each of said plurality of resources allocated to 
nonspeculative instructions among said plurality of 
instructions that are dispatched concurrently with 
an instruction which is a first instruction within a 
speculative execution path.

7. The apparatus for tracking the allocation of resources
within a processor of Claim 5, wherein said processor
supports a second speculative execution
path and said resource counter further includes a
third counter, said apparatus further comprising:

means for loading said third counter with a value
of said first counter prior to incrementing
said first counter in response to dispatching a
selected instruction among said plurality of
instructions to one of said plurality of execution
units for execution, wherein said selected
instruction is a first instruction within a second
speculative execution path;

means for decrementing said third counter
once for each resource allocated to a particular
instruction among said plurality of instructions
dispatched to said plurality of execution units in
response to completion of said particular
instruction; and

means for transferring a value of said third
counter to said first counter in response to
resolution of a first speculative execution path as
correctly predicted and refutation of said second
speculative execution path, wherein said
resource counter tracks a number of said
plurality of resources allocated to said plurality of
instructions; and

said first and said second counters comprising
first and second shift registers, respectively,
wherein each of said first and said second shift
registers indicates a number of allocated
resources among said plurality of resources by a
bit position of a set bit within said first and said
second shift registers, wherein said means for
incrementing said first counter comprises
means for shifting said set bit in a first direction
within said first shift register one bit position for
each of said plurality of resources allocated to
said instruction, and wherein said means for
decrementing said first and said second
counters comprises means for shifting said set
bits within said first and said second shift
registers in a second direction one bit position for
each resource allocated to said particular
instruction.

8. The apparatus for tracking the allocation of resources
within a processor of Claim 5, wherein said
plurality of resources comprise a plurality of rename
data buffers utilized to store data associated with
said plurality of instructions prior to completion; and
wherein said processor supports out-of-order
execution of said plurality of instructions and includes
an instruction sequencing table having a plurality of
entries, wherein each of said plurality of instructions
is assigned one of said plurality of entries
sequentially according to a program order of said plurality
of instructions, such that said plurality of
instructions can be completed according to said program
order, wherein said plurality of resources comprise
said plurality of entries within said instruction
sequencing table.


9. A superscalar processor, comprising: 

a plurality of execution units, wherein instruc- 
tions dispatched to said plurality of execution 
units can be executed out of program order; 

a plurality of user-accessible data registers; 

a plurality of rename buffers; 

means for dispatching instructions to said plu- 
rality of execution units; 

means for assigning an instruction identifier to 
each of a plurality of instructions dispatched to 
said plurality of execution units for execution, 
wherein an instruction identifier is assigned to 
each of said plurality of instructions sequential- 
ly according to a program order of said plurality 
of instructions; 

a table having a plurality of entries, wherein 
each entry among said plurality of entries is as- 
sociated with an instruction identifier and con- 
tains a finish indicator that indicates whether 
execution of an instruction assigned an instruc- 
tion identifier associated with said each entry 
has finished; 

means for setting a finish indicator within a par- 
ticular entry among said plurality of entries with- 
in said table in response to termination of exe- 
cution of an instruction assigned to an instruc- 
tion identifier associated with said particular en- 
try; 

one or more pointers which point to entries with- 
in said table associated with instruction identi- 
fiers assigned to a subset of said plurality of in- 
structions that can possibly be completed dur- 
ing a particular processor cycle, wherein a se- 
lected instruction among said subset is com- 
pleted by transferring data associated with said 
selected instruction from associated ones of 
said plurality of rename buffers to selected 
ones of said plurality of data registers; and 

means for completing selected instructions
within said subset of said plurality of
instructions, wherein exceptions generated by said
selected instructions have been handled,
wherein instructions among said plurality of
instructions which are assigned instruction
identifiers preceding said selected instructions
have been completed during a previous
processor cycle or will be completed during the
same processor cycle, and wherein instruction
identifiers assigned to said selected
instructions are associated with entries having set
finish indicators, such that said plurality of
instructions are completed according to said program
order.
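
The in-order completion condition recited above can be
sketched as a scan from a completion pointer. The sketch is
illustrative only: the names Entry, exception_pending and
complete_in_order are hypothetical, the table is modelled
as a plain list instead of hardware entries, and the limit
of three completions per cycle is an assumed design choice.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    finished: bool = False           # finish indicator
    exception_pending: bool = False  # assumed flag: exception not yet handled


def complete_in_order(table, head, max_per_cycle=3):
    """Complete finished entries in program order, starting at `head`.

    Returns (number completed this cycle, new head index). The scan stops
    at the first entry that is unfinished or has an unhandled exception,
    so no instruction completes ahead of an older one.
    """
    completed = 0
    while completed < max_per_cycle and head + completed < len(table):
        entry = table[head + completed]
        if not entry.finished or entry.exception_pending:
            break
        completed += 1
    return completed, head + completed


table = [Entry(True), Entry(True), Entry(False), Entry(True)]
print(complete_in_order(table, head=0))  # (2, 2)
```

The fourth entry has finished executing but is not
completed, because the third, older instruction has not
finished.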

10. The superscalar processor of Claim 9, wherein
each entry within said table further comprises:

a field specifying a number of said plurality of
data registers required to complete an instruction
to which an instruction identifier associated
with said each entry is assigned; and

a field indicating exception conditions which
occurred during execution of an instruction to
which an instruction identifier associated with
said each entry is assigned.

11. The superscalar processor of Claim 9, said
superscalar processor having M rename buffers and said
superscalar processor further comprising:

a rename buffer counter, including:

a primary shift register having M+1 bits, wherein
said primary shift register indicates a first
number of said plurality of rename buffers
allocated to instructions which are dispatched and
uncompleted by a position of a set bit within
said primary shift register;

a backup shift register having M+1 bits, said
backup shift register being associated with a
speculative execution path, wherein said backup
shift register indicates a second number of
said plurality of rename buffers allocated to
instructions which are dispatched and uncompleted
and are not within said speculative execution
path, said second number being indicated
by a position of a set bit within said backup
shift register; and

means for transferring said second number
from said backup shift register to said primary
shift register in response to a determination that
said speculative execution path was
mispredicted; and

wherein said superscalar processor supports N
speculative execution paths, said data buffer
counter further comprising N backup shift reg-